Investigating Bilingual Deep Neural Networks for Automatic Recognition of Code-switching Frisian Speech
Abstract
In this paper, a code-switching automatic speech recognition (ASR) system built for the Frisian language is described. Frisian is mostly spoken in the province of Fryslân, which is located in the north of the Netherlands. The native speakers of Frisian are mostly bilingual and often code-switch in daily conversations due to the extensive influence of the Dutch language. In the scope of the FAME! Project, the influence of this unforeseen language switching on modern ASR systems will be investigated with the objective of building a robust recognizer that can handle this phenomenon. For this purpose, in this work, we design a bilingual deep neural network (DNN)-based ASR system and investigate the impact of bilingual DNN training in the context of code-switching speech. © 2016 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the Organizing Committee of SLTU 2016.
Similar Resources
Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection
We have recently presented an automatic speech recognition (ASR) system operating on Frisian-Dutch code-switched speech. This type of speech requires careful handling of unexpected language switches that may occur in a single utterance. In this paper, we extend this work by using some raw broadcast data to improve multilingually trained deep neural networks (DNN) that have been trained on 11.5 ...
Open Source Speech and Language Resources for Frisian
In this paper, we present several open source speech and language resources for the under-resourced Frisian language. Frisian is mostly spoken in the province of Fryslân which is located in the north of the Netherlands. The native speakers of Frisian are Frisian-Dutch bilingual and often code-switch in daily conversations. The resources presented in this paper include a code-switching speech da...
A Longitudinal Bilingual Frisian-Dutch Radio Broadcast Database Designed for Code-Switching Research
We present a new speech database containing 18.5 hours of annotated radio broadcasts in the Frisian language. Frisian is mostly spoken in the province Fryslân and it is the second official language of the Netherlands. The recordings are collected from the archives of Omrop Fryslân, the regional public broadcaster of the province Fryslân. The database covers almost a 50-year time span. The nativ...
Development of bilingual ASR system for MediaParl corpus
The development of an Automatic Speech Recognition (ASR) system for the bilingual MediaParl corpus is challenging for several reasons: (1) reverberant recordings, (2) accented speech, and (3) no prior information about the language. In that context, we employ frequency domain linear prediction-based (FDLP) features to reduce the effect of reverberation, exploit bilingual deep neural networks ap...
Addressing Code-Switching in French/Algerian Arabic Speech
This study focuses on code-switching (CS) in French/Algerian Arabic bilingual communities and investigates how speech technologies, such as automatic data partitioning, language identification and automatic speech recognition (ASR) can serve to analyze and classify this type of bilingual speech. A preliminary study carried out using a corpus of Maghrebian broadcast data revealed a relatively hi...